Robust Computer Vision: Theory and Applications

Author

  • MICHAEL S. LEW
Abstract

non-real) objects: plane geometric forms, solid geometric forms, and projected forms. The first class is the “real” class, consisting of objects from the real world. The second class consists of representations of real objects. The third class consists of abstractions that can be represented using symbols but do not correspond to real objects (because they have no corresponding stimulus in the real world).

Marr et al. [Marr, 1976; Marr and Poggio, 1979; Marr, 1982] made significant contributions to the study of the human visual perception system. In Marr’s paradigm [Marr, 1982], the focus of research shifts from applications to topics corresponding to modules of the human visual system. An illustration of this point is the so-called shape-from-x research, which represents an important part of the total research in computer vision [Aloimonos, 1988]. Papers dealing with shape-from-x techniques include: shape from shading [Zhang et al., 1999], shape from contour [Horaud and Brady, 1988], shape from texture [Malik and Rosenholtz, 1997], shape from stereo [Hoff and Ahuja, 1989], and shape from fractal geometry [Chen et al., 1990].

In [Marr, 1976], Marr developed a primal sketch paradigm for early processing of visual information. In his method, a set of masks is used to measure discontinuities in the first and second derivatives of the original image. This information is then processed by subsequent procedures to create a primal sketch of the scene. The primal sketch contains the locations of edges in the image and is used by subsequent stages of the shape analysis procedure. Marr and Hildreth [Marr and Hildreth, 1980] further developed the concept of the primal sketch and proposed a new edge detection filter based on the zero crossings of the Laplacian of the two-dimensional Gaussian distribution function. In this approach, zero crossings of the Laplacian mark the inflection points of the intensity profile across an edge and thereby give the edge positions.
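The zero-crossing idea can be illustrated in one dimension. The sketch below is not Marr and Hildreth's implementation: it uses the second derivative of a 1-D Gaussian as a stand-in for the 2-D Laplacian of Gaussian, and the function names, the kernel radius of 4σ, the mean subtraction, and the slope threshold are all assumptions made for illustration.

```python
import math

def log_kernel(sigma, radius=None):
    """Sampled second derivative of a 1-D Gaussian (1-D analogue of the LoG)."""
    if radius is None:
        radius = int(math.ceil(4 * sigma))  # assumed truncation radius
    kernel = []
    for x in range(-radius, radius + 1):
        g = math.exp(-x * x / (2 * sigma * sigma))
        kernel.append((x * x - sigma * sigma) / sigma ** 4 * g)
    # Subtract the mean so that a constant signal gives (numerically) zero response.
    mean = sum(kernel) / len(kernel)
    return [k - mean for k in kernel]

def convolve_same(signal, kernel):
    """Convolution with edge replication; output has the same length as the input."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(i + r - j, 0), n - 1)  # clamp to replicate the borders
            acc += k * signal[idx]
        out.append(acc)
    return out

def zero_crossings(response, min_slope=0.01):
    """Indices i where the response changes sign between samples i and i + 1.

    Crossings with a small jump are discarded; this suppresses spurious sign
    changes caused by rounding noise in flat regions (threshold is an assumption).
    """
    hits = []
    for i in range(len(response) - 1):
        a, b = response[i], response[i + 1]
        if a * b < 0 and abs(a - b) >= min_slope:
            hits.append(i)
    return hits

# A step edge between samples 19 and 20.
signal = [0.0] * 20 + [1.0] * 20
response = convolve_same(signal, log_kernel(sigma=2.0))
edges = zero_crossings(response)
print(edges)  # [19]
```

The filtered signal is positive just before the step and negative just after it, so the sign change falls exactly at the inflection point of the smoothed edge. A larger `sigma` would smooth more aggressively and respond only to coarser edges.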
Koenderink [Koenderink, 1984] and Koenderink and van Doorn [Koenderink and Van Doorn, 1986] studied the psychological aspects of visual perception and proposed several interesting paradigms. Conventional approaches to shape are often static in the sense that they treat all shape details equally as global shape features [Koenderink and Van Doorn, 1986]. A dynamic shape model was developed in which visual perception is performed on several scales of resolution. Such notions of order and relatedness are present in visual psychology and absent in conventional geometrical theories of shape. It has been argued in [Koenderink and Van Doorn, 1986] that there exist manuals of art theory (such as [Gombrich, 1960]) which have not been given the attention they deserve and which contain practical knowledge accumulated over centuries. In art as well as in perception, a shape is viewed as a hierarchical structure. A procedure for morphogenesis based on multiple levels of resolution has been developed [Koenderink and Van Doorn, 1986]. Any shape can be embedded in a “morphogenetic sequence” based on the solution of the partial differential equation that describes the evolution of the shape through multiple resolutions.

Many authors agree on the significance of high curvature points for visual perception. Attneave [Attneave, 1954] performed psychological experiments to investigate the significance of corners for perception. In Attneave’s famous cat experiment, points of high curvature were located in a drawing of a cat and then connected to create a simplified drawing. After a brief presentation, the cat could be reliably recognized in the simplified drawing. It has been suggested that such points have high information content. Attneave’s work initiated further research on the topic of curve partitioning [Wuescher and Boyer, 1991; Fischler and Wolf, 1994; Katzir et al., 1994].
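The idea of keeping only points of high curvature can be sketched on a sampled closed contour. The code below is a minimal illustration, not Attneave's procedure: it approximates curvature by the turning angle at each sample, and the function names and the 30-degree threshold are assumptions.

```python
import math

def turning_angle(prev_pt, pt, next_pt):
    """Signed exterior angle (radians) at pt; positive for a left turn."""
    ax, ay = pt[0] - prev_pt[0], pt[1] - prev_pt[1]
    bx, by = next_pt[0] - pt[0], next_pt[1] - pt[1]
    return math.atan2(ax * by - ay * bx, ax * bx + ay * by)

def high_curvature_points(contour, min_angle_deg=30.0):
    """Keep the samples of a closed contour whose turning angle is large.

    The threshold is an illustrative assumption, not a value from the literature.
    """
    n = len(contour)
    keep = []
    for i in range(n):
        angle = turning_angle(contour[i - 1], contour[i], contour[(i + 1) % n])
        if abs(math.degrees(angle)) >= min_angle_deg:
            keep.append(contour[i])
    return keep

# A square sampled at its corners and at the midpoint of each side.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
corners = high_curvature_points(square)
print(corners)  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```

Connecting the retained points with straight lines reproduces the square exactly, while the four midpoints, which carry no shape information, are dropped.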
To approximate curves by straight lines, high curvature points are the best places to break the lines, so that the resulting image retains the maximal amount of information necessary for successful shape recognition. For the purpose of shape description, corners are used as points of high curvature and the shape can be approximated by a polygon. Davis [Davis, 1977] combined the use of high curvature points and line segment approximations for polygonal shape approximations. Stokely and Wu [Stokely and Wu, 1992] investigated methods for measuring the curvature of 3-D surfaces that arise in many applications (e.g. tomographic medical images).

Hoffman and Richards [Hoffman and Richards, 1984] argue that when the visual system decomposes objects, it does so at points of high negative curvature. This agrees with the principle of transversality [Guillemin and Pollack, 1974] found in nature. This principle contends that when two arbitrarily shaped convex objects interpenetrate each other, the meeting point is a boundary point of concave discontinuity of their tangent planes.

Leyton [Leyton, 1987] demonstrated the Symmetry-Curvature theorem, which states that any curve section that has only one curvature extremum has one and only one symmetric axis, which terminates at the extremum itself. This is an important result because it establishes the connection between two important notions in visual perception. In [Leyton, 1989], Leyton developed a theory which claims that all shapes are basically circles that changed form as a result of various deformations caused by external forces such as pushing, pulling, and stretching. Two problems were considered: the first was the inference of the shape history from a single shape, and the second was the inference of shape evolution between two shapes. The concept of symmetry-curvature was used to explain the process that deformed the object.
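Returning to the observation by Hoffman and Richards that parts are separated at points of negative curvature: on a polygonal outline, the discrete counterpart is a concave vertex. The sketch below finds such vertices; it is only an illustration on polygons, not the authors' method for smooth contours, and the function names are assumptions.

```python
def signed_area(polygon):
    """Shoelace formula; positive for counterclockwise vertex order."""
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0

def concave_vertices(polygon):
    """Vertices where the outline turns against its overall orientation.

    These are the polygonal analogue of negative-curvature points and serve
    here as candidate part boundaries.
    """
    orientation = 1.0 if signed_area(polygon) > 0 else -1.0
    n = len(polygon)
    result = []
    for i in range(n):
        px, py = polygon[i - 1]
        cx, cy = polygon[i]
        nx, ny = polygon[(i + 1) % n]
        cross = (cx - px) * (ny - cy) - (cy - py) * (nx - cx)
        if cross * orientation < 0:
            result.append(polygon[i])
    return result

# An L-shaped outline: two rectangles joined at the inner corner (1, 1).
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(concave_vertices(l_shape))  # [(1, 1)]
```

The single concave vertex is where the two rectangular parts of the L meet, which matches the transversality argument: joining two convex shapes creates a concavity at the junction.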
Symmetric axes show the directions along which a deformation process most likely took place. In [Leyton, 1987], Leyton proposed a theory of nested structures of control which, he argues, governs the functioning of the human perceptual system. It is a hierarchical system in which each level of control also includes all levels below it in information processing.

Pentland [Pentland, 1984; Pentland, 1986] investigated methods for representing natural forms by means of fractal geometry. He argued that fractal functions are appropriate for natural shape representation because many physical processes produce fractal surface shapes. This is due to the fact that natural forms repeat whenever possible and non-animal objects have a limited number of possible forms [Stevens, 1974]. Most existing schemes for shape representation were developed for engineering purposes and not necessarily to study perception. Fractal representations produce objects which correspond much better to the human model of visual perception and cognition.

Lowe [Lowe, 1987] proposed a computer vision system that can recognize three-dimensional objects from unknown viewpoints and single two-dimensional images. The procedure is atypical: it uses three mechanisms of perceptual grouping to determine three-dimensional knowledge about the object, as opposed to a standard bottom-up approach. The disadvantage of bottom-up approaches is that they require an extensive amount of information to recognize an object. By contrast, the human visual system is able to perform recognition even with very sparse data or partially occluded objects. Perceptual grouping operations must satisfy two conditions. The viewpoint invariance condition: observed primitive features must remain stable over a range of viewpoints. The detection condition: there must be enough information available to avoid accidental misinterpretations.
The grouping operations used by Lowe are the following. Grouping on the basis of proximity of line end points was used as one viewpoint-invariant operation. The second operation was grouping on the basis of parallelism, which is also viewpoint independent. The third operation was grouping based on collinearity. The preprocessing operation consisted of edge detection using Marr’s zero crossings in the image convolved with a Laplacian of Gaussian filter. In the next step, line segmentation was performed. Grouping operations were then applied to the line-segmented data to determine possible locations of objects.
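The three grouping relations can be sketched as pairwise tests on line segments. This is a simplified illustration only: the text does not give Lowe's actual grouping criteria, so the fixed thresholds (`max_gap`, `max_angle_deg`, `max_offset`) and all function names below are assumptions.

```python
import math

def direction(seg):
    """Undirected orientation of a segment, as an angle in [0, pi)."""
    (x1, y1), (x2, y2) = seg
    return math.atan2(y2 - y1, x2 - x1) % math.pi

def angle_between(s, t):
    """Smallest angle between the orientations of two segments (radians)."""
    d = abs(direction(s) - direction(t))
    return min(d, math.pi - d)

def endpoint_gap(s, t):
    """Smallest distance between an end point of s and an end point of t."""
    return min(math.dist(p, q) for p in s for q in t)

def point_line_distance(p, seg):
    """Perpendicular distance from p to the infinite line through seg."""
    (x1, y1), (x2, y2) = seg
    length = math.dist(seg[0], seg[1])
    return abs((x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)) / length

def classify_pair(s, t, max_gap=1.0, max_angle_deg=5.0, max_offset=0.5):
    """Return the grouping relations that hold between two segments.

    All thresholds are illustrative assumptions.
    """
    relations = set()
    if endpoint_gap(s, t) <= max_gap:
        relations.add("proximity")
    if math.degrees(angle_between(s, t)) <= max_angle_deg:
        # Nearly equal orientation: collinear if t lies on the line of s,
        # otherwise merely parallel.
        if max(point_line_distance(p, s) for p in t) <= max_offset:
            relations.add("collinear")
        else:
            relations.add("parallel")
    return relations

a = ((0, 0), (4, 0))
b = ((0, 2), (4, 2))      # same orientation, offset by 2
c = ((4.5, 0), (9, 0))    # continues a after a small gap
d = ((4, 0.2), (4, 3))    # perpendicular, starts near the end of a

print(classify_pair(a, b))          # {'parallel'}
print(sorted(classify_pair(a, c)))  # ['collinear', 'proximity']
print(classify_pair(a, d))          # {'proximity'}
```

In a full pipeline these tests would run on the segments produced by the line segmentation step, and pairs that satisfy a relation would be passed on as candidate object locations.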


Related resources

A Fast, Robust, Automatic Blink Detector

Introduction “Blink” is defined as closing and opening of the eyes in a small duration of time. In this study, we aimed to introduce a fast, robust, vision-based approach for blink detection. Materials and Methods This approach consists of two steps. In the first step, the subject’s face is localized every second and with the first blink, the system detects the eye’s location and creates an ope...


Human Computer Interaction Using Vision-Based Hand Gesture Recognition

With the rapid emergence of 3D applications and virtual environments in computer systems, the need for a new type of interaction device arises. This is because traditional devices such as the mouse, keyboard, and joystick become inefficient and cumbersome within these virtual environments. In other words, the evolution of user interfaces shapes the change in Human-Computer Interaction (HCI). In...


Computer assisted instruction during quarantine and computer vision syndrome

Computer vision syndrome (CVS) is a set of visual, ocular, and musculoskeletal symptoms that result from long-term computer use. These symptoms include eyestrain, dry eyes, burning, pain, redness, blurred vision, etc., which increase with the duration of computer use. Currently, with the closure of schools and universities due to the continued COVID-19 pandemic, many universities have taken the pr...


Robot Motion Vision Part I: Theory

A direct method called fixation is introduced for solving the general motion vision problem, arbitrary motion relative to an arbitrary environment. This method results in a linear constraint equation which explicitly expresses the rotational velocity in terms of the translational velocity. The combination of this constraint equation with the Brightness-Change Constraint Equation solves the gene...


Research Summary

My research interest in computer vision includes human activity recognition, human motion analysis, tracking, human identification, gait and gesture recognition, statistical methods for computer vision. I mainly focus on developing robust real-time algorithms based on sound theory for solving realistic computer vision problems for many applications. My research is widely applicable in visual su...




Publication date: 2003